The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. A-SGD, whose large scale has been used mostly in industry. We report early experiments with a system that makes use of both model parallelism and data parallelism, which we call GPU A-SGD. We show that using GPU A-SGD it is possible to speed up training of large convolutional neural networks useful for computer vision. We believe GPU A-SGD will make it possible to train larger networks on larger training sets in a reasonable amount of time.
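The data-parallel side of this approach, asynchronous SGD, can be illustrated with a minimal sketch: several workers draw examples from their own data shards and push gradient updates to a shared parameter copy without synchronization barriers. This is an illustrative toy (a one-parameter least-squares problem with thread-based workers), not the paper's implementation; all names (`params`, `worker`, the learning rate) are assumptions for the example.

```python
import threading
import random

# Toy objective: fit w so that y = w * x, for data generated with w = 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

params = {"w": 0.0}   # shared parameters; in GPU A-SGD each worker would
                      # hold a model replica on its own GPU (model parallelism)
lr = 0.01             # illustrative learning rate

def worker(shard, steps, seed):
    # Each worker samples from its shard and applies gradient updates to
    # the shared parameters asynchronously (no locks, no barriers).
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(shard)
        grad = 2.0 * (params["w"] * x - y) * x   # d/dw of (w*x - y)^2
        params["w"] -= lr * grad                  # asynchronous update

threads = [threading.Thread(target=worker, args=(data, 500, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# params["w"] converges toward 3.0 despite workers racing on the update.
```

The point of the sketch is that stale, racing updates still converge on this convex toy problem; the paper's contribution is running each such worker's gradient computation on a GPU so both forms of parallelism compound.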